Eyes Wide Open - Optimising Cosmological Surveys in a Crowded Market

Optimising the major next-generation cosmological surveys (such as SNAP, KAOS etc.) is a
key problem given our ignorance of the physics underlying cosmic acceleration and the plethora of 
surveys planned. We propose a Bayesian design framework which (1) maximises the discrimination 
power of a survey without assuming any underlying dark energy model, (2) finds the best niche 
survey geometry given current data and future competing experiments, (3) maximises the cross- 
section for serendipitous discoveries and (4) can be adapted to answer specific questions (such as 'is 
dark energy dynamical?'). Integrated Parameter Space Optimisation (IPSO) is a design framework 
that integrates projected parameter errors over an entire dark energy parameter space and then 
extremises a figure of merit (such as Shannon entropy gain which we show is stable to off-diagonal 
covariance matrix perturbations) as a function of survey parameters using analytical, grid or MCMC 
techniques. We discuss examples where the optimisation can be performed analytically. IPSO is 
thus a general, model-independent and scalable framework that allows us to appropriately use prior 
information to design the best possible surveys. 



I. INTRODUCTION 

Our almost total ignorance of the source of cosmic ac- 
celeration has provided the dark, damp conditions ideal 
for the spawning of wild and varied theories. Among 
many others, cosmic acceleration has been ascribed to a 
modification of gravity on large scales [1], macroscopic 
quantum effects (e.g. [2]), condensates [3,4], unified dark 
energy [5], a late-time phase transition [6] or a 'mundane' 
scalar field with almost flat potential. Unfortunately per- 
haps, the least informative possibility - Einstein's great- 
est blunder; a cosmological constant - is still a good fit 
to current data [7]. 

This zoo of possibilities highlights the profound re- 
ordering of our view of cosmology and high-energy 
physics that will follow from understanding the true na- 
ture of dark energy. This exciting prospect has stim- 
ulated the proposal of a spectacularly wide variety of 
dark energy experiments for deployment over the next 
two decades. 

Not surprisingly these experiments currently operate 
on a mutually competitive basis. As a result there has 
been little or no consideration given to how to opti- 
mally configure these surveys in order to get the best, 
model-independent constraints on dark energy models ei- 
ther from each survey alone or in conjunction with the 
other planned surveys. It is the aim of this paper to 
begin to address these important issues by presenting a 
framework which we call Integrated Parameter Space Op- 
timisation (IPSO). IPSO implements survey design that 
is model-independent, flexible and uses prior information
within the framework of Bayesian optimal design. IPSO 
has been implemented numerically using Markov Chain 
Monte Carlo (MCMC) methods in [8] and in optimising 
the design of the KAOS/GWFMOS instrument. 

A cursory look at the descriptions of most next- 
generation dark energy experiments would suggest that 
their aim begins and ends with nailing down the two 



numbers entering the simple expansion $w(z) = w_0 + w_1 z$,
of the dark energy equation of state. The truth is very 
different. 

In order for dark energy experiments to give us new 
knowledge about high-energy physics we need to know 
a great deal more than just the low-redshift evolution 
of w(z). Knowing the mass of the dark energy particle 
(equivalently its Compton wavelength [3]), its speed of 
sound [9] and its couplings to baryons and dark matter 
[10] may all prove crucial. Extracting all this information 
will require the full range of next-generation experiments 
and beyond. From this perspective, mutually optimising 
the next-generation experiments to maximise our knowl- 
edge is not only prudent, it is crucial. 

Current and proposed dark energy surveys fall into 
one of a number of categories. There are tests sensi- 
tive to the background dynamics of the cosmos, such 
as distance (luminosity or angular-diameter) tests. Pri- 
mary next-generation experiments in this category are 
the SNAP satellite [30] and the KAOS baryon oscilla- 
tion galaxy survey [11,22-24]. There are also Hubble 
constant tests (such as KAOS) [11,25] and galaxy and 
cluster number count surveys either using galaxies (such 
as DEEP2) [26] or clusters detected through X-ray emis- 
sion or the Sunyaev-Zel'dovich effect, such as ACT, SPT, 
SZA, DUO, DUET [27] and AMiBA [28]. 

Additional powerful constraints come from the CMB 
which tests both dark energy dynamics and perturba- 
tions [7], while VISTA, LSST [33] and pan-STARRS [34] 
will provide powerful new arenas to search for CMB-LSS 
correlations, expected from the acceleration-induced late 
ISW effect, and weak gravitational lensing. The lat- 
ter will also be probed by SNAP [29]. The proposed 
Dark Energy Survey (DES) [12] would simultaneously ob- 
tain weak lensing, cluster number counts and supernovae 
data. Further on, the Square Kilometer Array (SKA)
will provide excellent constraints on distance and Hubble
constant from weak lensing and the baryon oscillations
in the matter power spectrum [32]. Then there will be
unexpected sources of help. For example, SKA will map
the reionisation of the universe, thereby providing independent
constraints on $\tau$, the optical depth to Thomson
scattering. Since this is degenerate in the TT spectrum
of the CMB with the epoch of acceleration, breaking this
degeneracy may significantly improve constraints on dark
energy dynamics [7].

However, given that many of the experiments listed 
above are currently lacking a final science definition, a 
crucial question is "What is the optimal survey struc- 
ture (niche) for each of these experiments given the other 
experiments?" For example, for redshift surveys which 
measure some quantity as a function of z (which will be 
the main focus of this paper), what distribution of red- 
shift bin error bars (or equivalently how much observing 
time should be spent in each redshift bin) should one 
aim to achieve to get the best constraints on dark en- 
ergy? Should one concentrate on a small redshift range 
covering a large area or should one cover a wide range of 
redshifts in a narrow, pencil-beam survey? All this has 
to be addressed despite us being fundamentally ignorant 
about the properties of the dark energy. 

Clearly these are difficult questions. First, one does
not know in advance which experiments will actually be
funded. Second, it is possible that our understanding
of the universe will undergo further shocks and revelations,
so designing surveys that are robust and
sensitive to the unexpected is desirable.

This paper presents a framework for optimising any 
survey (not just a dark energy redshift survey) in a man- 
ner which is flexible and easy to adjust if new 'compet- 
ing' experiments are introduced, thereby allowing a clear 
niche to be found, a niche which can be touted to funding 
agencies, rather than relying on minimal improvements 
to errors on $w_0$, $w_1$. Further, it does not assume a model
for the dark energy and it can be automatically adjusted 
to allow for optimal answering of specific questions. 

Currently, optimisation of surveys in the context of 
dark energy is at a rather basic stage, primarily perhaps 
because of our ignorance of dark energy, as discussed ear- 
lier. By contrast, optimisation for CMB experiments is 
rather well understood (see e.g. [36,37]). Previous dark
energy survey analyses have typically fallen into two categories:
(1) optimisation of survey properties [13,16,15]
to best estimate parameters assuming constant $w$ (often
$w = -1$) and (2) comparison of error estimates for parameters
in a small number (less than six) of specific dark
energy models [22,17,14].

But, except for the proposal in [13], no precise criterion 
for selecting one survey geometry over another has been 
given. More importantly perhaps, none of these analyses 
were model-independent and hence were limited for two 
reasons: first they all selected an underlying dark energy 



model* and second, the underlying models chosen typically
belonged to a very limited dark energy parameter
space, $\Theta$, such as $w = \mathrm{constant}$ or $(w_0, w_1)$ as appearing
in the simple expansion $w(z) = w_0 + w_1 z$.

Unfortunately, such an approach induces significant 
bias when used with real data [38] and does not suffi- 
ciently allow for our ignorance of the underlying model. 
Our proposal (IPSO) is the natural generalisation of this 
general ethos to a Bayesian optimal design context which 
allows for model-independence, a specific criterion for op- 
timisation and inclusion of prior and competing survey 
data. 

In this paper we follow the Einstein summation convention,
so repeated indices are summed over.† We interchangeably
use index and index-free forms of vectors
and matrices, e.g. both $F$ and $F_{\mu\nu}$ refer to the expected
Fisher matrix.

Section (II) introduces the framework, section (III) discusses
various Figures of Merit (FoM) while section (IV)
discusses issues related to the actual survey optimisation.
Section (V) discusses simple examples of optimisation
and the issues one faces in full implementations.

FIG. 1. Flow-chart for integrated parameter space optimisation
(IPSO). For each survey geometry in a denumerable
set (either discrete or continuous) some function of the
projected parameter covariance matrix is integrated over the
dark-energy parameter space to yield a Figure of Merit
(FoM, a positive real number). Optimisation proceeds by selecting
the candidate survey geometry with the minimum or
maximum FoM, depending on the precise FoM used; see section
(III).



II. THE BASIC FRAMEWORK FOR IPSO 

The flow-chart for the Integrated Parameter Space Optimisation
(IPSO) framework is simple, which gives it
considerable flexibility. Consider a set of allowed
survey geometries, indexed by s. As we will discuss
below, they can depend continuously on survey parameters
(such as survey volume or area) or can be discontinuous
proposals for different survey geometries or even survey
types.

For example, one geometry may include a completely 
different component (such as the weak lensing survey 
added to SNAP, follow-up observations for SNIa in a 
lensing survey or adding a high-redshift component to 
a number counts survey). For each survey geometry, s,
we compute an appropriate Figure of Merit (FoM) - also
known as the utility in Bayesian evidence design, risk or
fitness - and optimisation then simply proceeds by selecting
the survey geometry which extremises (minimising or
maximising where appropriate) the FoM.

The key to successful optimisation clearly lies in the 
criteria used to construct the FoM. Here we propose an 
FoM suited to our current (lack of) knowledge of dark en- 
ergy, and to our goals: to maximise discrimination power 
in terms of dark energy models and fundamental physics. 
Hence, our IPSO FoM is chosen to be an integral of some
function, $I$, of the $1\sigma$ covariances over the dark energy
parameter space, $\Theta$, chosen by the user:

$$\mathrm{FoM}(s) = \int_\Theta I(s, \theta_\mu)\, d\theta_\mu. \qquad (1)$$

$I(s, \theta_\mu)$ is a scalar which in general will depend on the
survey geometry, the prior information available and position
in $\Theta$. We will discuss in the next section suitable
choices for $I$.

By integrating over the whole range of possible dark
energy models we achieve model-independence. In general
$\Theta$ will be an $n$-dimensional space spanned by $n$ dark
energy parameters $\theta_1, \ldots, \theta_n$. A standard example would



Recent applications of FoM optimisation to problems in experiment
design in astronomy and astrophysics can be found
in [19].



be the space spanned by $\theta = (w_0, w_1, \ldots)$, the parameters
entering a description of the dark energy equation
of state $w(z)$, but the parameter space could parametrise
any quantity related to dark energy (scale factor, Hubble
constant, distances, dark energy density, speed of sound,
Compton wavelength etc.).

Our choices for $I$ will be based around the computation,
at each point in $\Theta$, of the expected $1\sigma$ error bars
for each of the $n$ parameters spanning $\Theta$§. This can either
be done using Fisher matrix techniques or e.g. by
direct Markov Chain Monte Carlo (MCMC) simulation.
The key point is that the simulated error bars used to
compute the likelihood at each grid point will depend on
the survey configuration, s, which is where the power to
optimise the survey arises, whereas the prior precision
matrix, denoted P, is of course independent of the
survey.

Optimisation then consists of two parts. 

• For each survey configuration, s, compute FoM(s),
given by eq. (1), a real number.

• Extremise the FoM(s) over the set of survey geometries
using analytical, grid or Markov Chain
Monte Carlo (MCMC) methods.
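
The two steps above can be sketched numerically. The following is a minimal illustration only: the toy Fisher matrix `fisher`, the parameter grid, the prior and the single survey parameter `s` (fraction of observing time in one of two bins) are all hypothetical choices, and the integral in eq. (1) is approximated by an average over grid points.

```python
import numpy as np

def fisher(s, theta):
    """Toy 2x2 Fisher matrix (hypothetical): precision scales with the
    fraction s of observing time in bin 1 and (1 - s) in bin 2."""
    w0, w1 = theta
    a = 4.0 + w0 ** 2   # assumed sensitivity of bin 1, varies over Theta
    b = 1.0 + w1 ** 2   # assumed sensitivity of bin 2, varies over Theta
    return np.diag([a * s, b * (1.0 - s)])

def fom(s, theta_grid, prior):
    """Step 1: average I = log det(F + P) over the grid of models."""
    vals = [np.linalg.slogdet(fisher(s, th) + prior)[1] for th in theta_grid]
    return float(np.mean(vals))

# Grid over Theta = (w0, w1) and a weak diagonal prior precision matrix P.
theta_grid = [(w0, w1) for w0 in np.linspace(-1.5, -0.5, 5)
                        for w1 in np.linspace(-0.5, 0.5, 5)]
prior = 0.1 * np.eye(2)

# Step 2: extremise over candidate survey geometries (simple grid search).
candidates = np.linspace(0.05, 0.95, 19)
foms = np.array([fom(s, theta_grid, prior) for s in candidates])
best_s = float(candidates[np.argmax(foms)])
```

Here the FoM is maximised; for an FoM that must be minimised, `np.argmin` would replace `np.argmax`.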

We now discuss each of these aspects in turn. 

III. FOM FOR A GIVEN SURVEY GEOMETRY 

There are a number of ways of defining an appropriate
Figure of Merit (FoM), which for us reduces to a choice
of $I(s,\theta_\mu)$ in eq. (1). Within the context of Bayesian
optimal design this is a well-known problem with many
different proposals having been made, depending on what
the aim of the experiment is [20]. Here we consider three
choices.

The first effectively integrates the sum of the components
of the $1\sigma$ error vector over $\Theta$ and in its simplest form
reduces to A-optimality in Bayesian design. The second
integrates the volume of the $1\sigma$ error ellipse over $\Theta$ while
the third integrates the logarithm of the error ellipse
volume and in its simplest form reduces to D-optimality,
namely the maximisation of the Shannon information in
going from prior to posterior. All of these proposals
can either include or exclude prior information as desired
to yield Bayesian or non-Bayesian optimal solutions respectively.

Before discussing the details we briefly remind the
reader of our notation. The dark energy parameters are
denoted $\theta_\mu$, spanning the $n$-dimensional space $\Theta$. The



§ If prior information is to be used, the Fisher matrix following
from the prior information (the prior precision matrix) is
computed too.

various survey configurations being considered are denoted
by s. The Fisher matrix for the $\theta_\mu$ resulting from
all prior cosmological information (the prior precision
matrix) is denoted P. For example, it may be derived
from CMB, LSS and SNIa data, as in [7]. Greek indices
($\mu, \nu, \ldots$) label coordinates in $\Theta$; Roman indices ($i, j, \ldots$)
label redshift bins.



A. Fisher matrix integration & A-optimality 

Consider the covariance matrix, labelled $C_{\mu\nu}$, over $\Theta$.
The n-th entry on the diagonal gives the variance in our
knowledge of $\theta_n$ while the off-diagonal terms provide the
covariances between the parameters. For nonlinear problems
$C_{\mu\nu} = C_{\mu\nu}(\theta_\mu)$: the error bars in general depend on
where one is in $\Theta$, the dark energy parameter space**.

We can then define a very general FoM to be maximised
via eq. (1) with the choice:

$$I(s,\theta_\mu) = \left(C^{-1}(s,\theta_\mu)\right)_{\mu\nu} W^{\mu\nu}(\theta_\mu) \simeq \mathrm{tr}\{F W\} \qquad (2)$$

where tr{A} = Tr{A} denotes the trace of any matrix A
(i.e. the sum of the diagonal terms) and the trace form
is valid only for symmetric matrices such as the covariance
matrix. The $(s)$ argument in equation (2) reminds us
that the covariance matrix depends on the survey geometry
chosen. The approximate equality comes from
the Cramer-Rao bound, which states that the inverse of
the Fisher matrix provides the best possible covariance
matrix, see e.g. [35]. The equality is nearly exact when
the likelihood is nearly Gaussian††. The Fisher matrix is
defined by (e.g. [18]):

$$F_{\mu\nu} = -\left\langle \frac{\partial^2 \ln \mathcal{L}}{\partial\theta_\mu\, \partial\theta_\nu} \right\rangle \qquad (3)$$

where $\mathcal{L}$ is the likelihood.

In eq. (2) $W^{\mu\nu}$ is a real, positive and symmetric matrix
over $\Theta$ which weights the importance of the various
components of the covariance matrix, can be used to
implement prior information, and can be chosen to optimally
address specific questions, such as 'is dark energy
dynamical?' (see subsection IV C).

For example, if study shows that there are certain parameters,
say $\theta_4$ and $\theta_6$, in the parameter space, which
actually provide very little constraint on the underlying



**For example, errors on both $w_0, w_1$ roughly double if the
underlying model is a cosmological constant rather than a
$w_0 = -2/3$, $w_1 = 0$ model [22].

††Indeed, one can include almost-Gaussianity as a design
criterion.



physics, it is clear that we should not give them equal
weight in designing the survey. Instead we can down-weight
their importance in the final optimisation process
by making $W^{4\nu}$ and $W^{6\nu}$ smaller than the other components.
In the limit where they are taken to zero, they
will have no influence on the optimisation of the survey
at all.

The simplest choice is $W^{\mu\nu} = 1$ $(\forall \mu, \nu)$; i.e. no priors,
no $\theta_\mu$ dependence and all parameters weighted equally.
In this case, and when the Fisher matrix is diagonal, eq.
(2) reduces to:

$$I(s,\theta_\mu) = \sum_{\mu=1}^{n} \sigma_\mu^{-2}(s,\theta_\mu) \qquad (4)$$

i.e. the sum of the inverses of the expected variances of
each of the dark energy parameters. Clearly we want to
minimise the $\sigma_\mu$ and hence want to maximise expression
(4).
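
As a small numerical check of the weighted trace in eq. (2) and its reduction to the sum of inverse variances, consider the sketch below. The error values and the choice of which parameter to down-weight are purely illustrative assumptions.

```python
import numpy as np

def trace_fom(F, W):
    """I = F_{mu nu} W^{mu nu}, which equals tr(F W) for symmetric W."""
    return float(np.sum(F * W))

sigma = np.array([0.1, 0.5, 2.0])    # hypothetical 1-sigma errors
F = np.diag(1.0 / sigma ** 2)        # diagonal Fisher matrix
W_ones = np.ones((3, 3))             # W^{mu nu} = 1 for all mu, nu
I_equal = trace_fom(F, W_ones)       # sum of inverse variances

# Down-weight the third parameter completely: zero its row and column of W.
W_down = W_ones.copy()
W_down[2, :] = 0.0
W_down[:, 2] = 0.0
I_down = trace_fom(F, W_down)        # third parameter no longer contributes
```

With the weights of a parameter set to zero, changes in its error bar leave the FoM unchanged, so it cannot influence the optimisation.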

However, instead of encoding prior information in W
we can construct a slightly different $I$, which this time
must be minimised:

$$I(s,\theta_\mu) = \left((F(s,\theta_\mu) + P)^{-1}\right)_{\mu\nu} W^{\mu\nu}. \qquad (5)$$

where P is the prior precision matrix and the Fisher matrix
is defined by (3).

Eq. (5) is useful since it automatically weights the importance
of prior information correctly. If the new survey
constraints (estimated by $F_{\mu\nu}(s, \theta_\mu)$) are much better
than current data (which are summarised in P) then the
prior makes a negligible contribution to optimisation and
vice versa. In this case it is natural to consider $W^{\mu\nu}$ independent
of $\theta_\mu$. If we further believe that every parameter
in $\Theta$ is equally useful (a reasonable starting point if $\Theta$
has been judiciously chosen), then we can simplify eq.
(5) even further by choosing $W^{\mu\nu} = \delta^{\mu\nu}$ to arrive at‡‡
the following expression to be minimised:

$$I(s,\theta_\mu) = \mathrm{tr}\{(F + P)^{-1}\} \qquad (6)$$

where F depends both on s and $\theta_\mu$. Minimisation of (6)
corresponds to A-optimality in the optimal design literature
[20]. However note that it does not give any importance
to the off-diagonal terms in the covariance matrix
$(F + P)^{-1}$.

In the special case where F and P are both diagonal, $I$ reduces to

$$I(s,\theta_\mu) = \sum_{\mu=1}^{n} \left(\sigma_{e,\mu}^{-2}(s,\theta_\mu) + \sigma_{p,\mu}^{-2}\right)^{-1} \qquad (7)$$

where the (e,p) subscripts on the inverse variances
$\sigma^{-2}$ denote the contributions from the expected (e) survey
data and from existing, prior (p), data respectively.



‡‡The Kronecker delta satisfies $\delta^{\mu\nu} = 1$ if $\mu = \nu$ and 0
otherwise.

Again we see that if the new survey yields very small errors,
as is expected for next-generation surveys, then $\sigma_e^{-2}$
is much larger than $\sigma_p^{-2}$ and dominates the optimisation
process.
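
A short sketch of the A-optimality integrand of eq. (6), checking that for diagonal matrices it agrees with the per-parameter form of eq. (7). The error values are hypothetical and chosen so that the survey is much more precise than the prior.

```python
import numpy as np

def a_opt(F, P):
    """I = tr{(F + P)^{-1}}: summed posterior variances (to be minimised)."""
    return float(np.trace(np.linalg.inv(F + P)))

sigma_e = np.array([0.05, 0.2])   # hypothetical expected survey errors
sigma_p = np.array([0.5, 1.0])    # hypothetical prior errors
F = np.diag(sigma_e ** -2.0)
P = np.diag(sigma_p ** -2.0)

I_matrix = a_opt(F, P)                                        # eq. (6)
I_eq7 = float(np.sum(1.0 / (sigma_e ** -2.0 + sigma_p ** -2.0)))  # eq. (7)
I_no_prior = float(np.sum(sigma_e ** 2))                      # prior ignored
```

Because the survey errors are small, `I_matrix` lies only slightly below `I_no_prior`: the prior contributes little to the optimisation in this regime.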

For complete generality let us now consider alternative
definitions of $I$.



B. Average error ellipse volume, D-optimality and 
maximum entropy 

A slightly more elegant FoM can be defined, although
it is rather more tricky to compute in nonlinear problems.
In general we are trying to compute the survey configuration
that minimises the error bars on our underlying
parameters. Since these parameters typically exhibit degeneracies
(particularly true of distance measurements)
an obvious optimisation procedure is to minimise the
volume of the $n$-dimensional $1\sigma$ error ellipse, averaged
over $\Theta$.

We can compute the volume of the error ellipse at any
point in the parameter space via the square-root of the
determinant of the covariance matrix. Equivalently one
can consider the minimisation of the square of the volume,
$V^2 \propto \det(C_{\mu\nu})$, modulated by $\varpi(\theta_\mu)$, a real-valued
function that allows us to impose priors about what part
of the parameter space is more important:

$$I(s,\theta_\mu) = \varpi(\theta_\mu)\, \det(C_{\mu\nu}) \qquad (8)$$


Although this is a natural FoM, with an immediate geo- 
metric interpretation, it has the disadvantage of not be- 
ing particularly easy to work with and, at least in this 
form, does not allow us to weight different parameters 
differently. 

Instead of trying to minimise (8) there are very good
reasons to formulate this as a maximisation problem instead
by considering:

$$I(s,\theta_\mu) = \varpi(\theta_\mu)\, \log\det(F + P) \qquad (9)$$


Maximising the integral of this expression is known in the
optimal design literature as Bayes D-optimality. We do
not explicitly write the dependence on $(s, \theta_\mu)$, which we
hope is obvious. Although maximising (9) does not have
the immediate geometric interpretation that minimising
(8) has, it has a very powerful interpretation: namely
it is the FoM which maximises the gain in Shannon entropy
(hence the logarithm) or information. Equivalently
it maximises the expected Kullback-Leibler distance in
going from the prior to the posterior. In other words,
it ensures that one gets the most information boost over
what one already had in hand from the prior data alone.
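
The link between log det(F + P) and information gain can be made concrete under the assumption that both prior and posterior are Gaussian, in which case the entropy reduction is half the difference of the log-determinants of the posterior and prior precision matrices. The sketch below uses hypothetical matrices and a unit weight function.

```python
import numpy as np

def d_opt(F, P):
    """I = log det(F + P), via slogdet for numerical stability."""
    sign, logdet = np.linalg.slogdet(F + P)
    if sign <= 0:
        raise ValueError("F + P must be positive definite")
    return float(logdet)

def info_gain(F, P):
    """Entropy reduction in nats, assuming Gaussian prior and posterior:
    0.5 * [log det(F + P) - log det(P)]."""
    return 0.5 * (d_opt(F, P) - float(np.linalg.slogdet(P)[1]))

P = np.eye(2)                 # hypothetical prior precision matrix
F = np.diag([3.0, 0.0])       # survey constrains only the first parameter
gain = info_gain(F, P)        # variance of parameter 1 shrinks by factor 4
```

Since log det(P) does not depend on the survey geometry, maximising log det(F + P) over s is equivalent to maximising this gain.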

While D-optimality is clearly very powerful, it and the
minimisation of error-ellipse volume expressed by eq. (8)
will tend to favour survey configurations in which one of
the principal axes of the ellipse is very small (the thinnest
ellipse), and hence may not be ideal in many situations.



For example, imagine that there exists a survey configuration
such that the variance $\sigma_5^2$ becomes vanishingly
small at every point in $\Theta$. Minimising the FoM of eq.
(8) would favour this survey configuration, but it might
turn out that the corresponding parameter, $\theta_5$, which is
measured with great accuracy, is actually of little use in
constraining dark energy models. The survey would then
provide wonderfully small error bars on an irrelevant parameter
and large error bars on relevant parameters. Of
course this problem can be easily resolved by choosing
decorrelated variables $\theta_\mu$ that are physically useful.

Which FoM is actually better for the needs of dark en- 
ergy cosmology requires further research. We note that 
in the case where the covariance matrices are not diag- 
onal, computing the inverse matrix is computationally 
intensive, so the relative CPU requirements of A- and 
D-optimality are not obvious. 



C. Other figures of merit 

There are many other choices for $I$, known collectively
as 'alphabet'-optimality, including E-, G- and I-optimality
and even finer schemes such as $D_{rm}$- and $G_{rm}$-optimality
[40], all optimising designs in different ways.
For example one could choose to find the thinnest possible
error ellipses by maximising the biggest eigenvalue
of the Fisher matrix. Alternatively one can minimise
the maximum eigenvalue of the covariance matrix - E-optimality -
which will favour small but circular error
ellipses.

This is also achieved by minimising the integral of: 



$$\frac{\mathrm{Tr}\{(F + P)^2\}}{\left(\mathrm{Tr}\{F + P\}\right)^2} \qquad \text{- optimality} \qquad (10)$$



This expression is simply the ratio of the sum of the
squares of all the entries of F + P to the square
of the trace and is minimised by diagonal Fisher matrices
with equal entries, which in turn indicate decorrelated
parameters and circular error ellipses.
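
The behaviour of this ratio can be checked on a few hypothetical 2x2 matrices. For an n x n symmetric positive-definite matrix the ratio is bounded below by 1/n, a value reached only when the matrix is proportional to the identity; correlations or unequal eigenvalues both push it upwards.

```python
import numpy as np

def trace_ratio(A):
    """Tr(A^2) / (Tr A)^2 for a symmetric matrix A = F + P."""
    return float(np.trace(A @ A) / np.trace(A) ** 2)

isotropic = np.eye(2)                             # circular error ellipse
elongated = np.diag([10.0, 0.1])                  # thin, axis-aligned ellipse
correlated = np.array([[1.0, 0.9], [0.9, 1.0]])   # strongly degenerate pair

r_iso, r_elong, r_corr = map(trace_ratio, (isotropic, elongated, correlated))
```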

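It is even possible to construct uncountably many FoM
using the fact that we can write the Fisher matrix (since
it is square and symmetric) as the product $E^T \Lambda E$ where
$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ contains the eigenvalues and $E$ is the
matrix of eigenvectors of the Fisher matrix. Then we may define
$I_p$ for $p \in [-\infty, \infty]$ as:

$$I_p(\theta_\mu, s) = \begin{cases} \max\{\lambda_\mu\} & \text{for } p = \infty \\ \left(\sum_\mu \lambda_\mu^p\right)^{1/p} & \text{for } p \neq 0, \pm\infty \\ \left(\prod_\mu \lambda_\mu\right)^{1/n} & \text{for } p = 0 \\ \min\{\lambda_\mu\} & \text{for } p = -\infty \end{cases} \qquad (11)$$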
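
A minimal sketch of this eigenvalue family follows, using `numpy.linalg.eigvalsh` for the symmetric eigen-decomposition. The example matrix is hypothetical. Note that p = 1 returns the trace of F, p = 0 returns det(F)^{1/n} (related to D-optimality) and p = -1 returns 1/tr(F^{-1}) (related to A-optimality).

```python
import numpy as np

def i_p(F, p):
    """Eigenvalue-based figure of merit I_p for a symmetric Fisher matrix."""
    lam = np.linalg.eigvalsh(F)          # real eigenvalues, ascending order
    if p == np.inf:
        return float(lam.max())
    if p == -np.inf:
        return float(lam.min())
    if p == 0:
        return float(np.prod(lam) ** (1.0 / lam.size))
    return float(np.sum(lam ** p) ** (1.0 / p))

F = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # eigenvalues 1 and 3
values = {p: i_p(F, p) for p in (np.inf, 1, 0, -1, -np.inf)}
```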

D. CPU constraints on the various FoM

The Fisher matrix splits into two factors: derivatives,
which depend on $\theta_\mu$ but are independent of $s_j$, and
errors, which depend on $s_j$ (and are sometimes independent
of $\theta_\mu$). Fortunately, the derivatives can be precomputed,
saving time in the final optimisation.
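
This factorisation can be illustrated with the standard Fisher matrix for independent Gaussian errors on a binned observable, $F_{\mu\nu} = \sum_i \sigma_i^{-2}\, \partial_\mu X_i\, \partial_\nu X_i$, which is assumed here rather than taken from the text. The derivative values and the error model (errors scaling as the inverse square root of observing time) are hypothetical.

```python
import numpy as np

def fisher_from_derivs(derivs, sigma):
    """F_{mu nu} = sum_i D_{i mu} D_{i nu} / sigma_i^2 (Gaussian errors)."""
    weighted = derivs / sigma[:, None] ** 2
    return derivs.T @ weighted

# Hypothetical derivatives dX_i/dtheta_mu for 3 redshift bins, 2 parameters.
# They depend on theta but not on the survey, so they are computed once.
derivs = np.array([[1.0, 0.2],
                   [1.0, 0.5],
                   [1.0, 1.0]])

def bin_errors(time_fractions, sigma_unit=0.1):
    """Hypothetical error model: sigma_i scales as 1/sqrt(observing time)."""
    return sigma_unit / np.sqrt(np.asarray(time_fractions, dtype=float))

# Only the errors change between candidate survey geometries.
F_even = fisher_from_derivs(derivs, bin_errors([1 / 3, 1 / 3, 1 / 3]))
F_high_z = fisher_from_derivs(derivs, bin_errors([0.1, 0.1, 0.8]))
```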

Figure (2) shows the CPU requirements for the various
FoM we have considered as a function of the dimensionality
of the Fisher matrix (number of parameters). The key
observation is that nuisance parameters have a profound
impact (see inset of figure), removing any CPU advantage
of one FoM over another. If no nuisance parameters
are included then CPU considerations will favour Fisher-sum
optimality and strongly disfavour A-optimality. The
inclusion of nuisance parameters (the realistic case) essentially
removes this disparity because the required
matrix inversions are then common to all the FoM.

We now move to the issue of how to actually compute 
the optimal survey configuration once an FoM has been 
selected. 



FIG. 2. CPU constraints on the various FoM for the case without
(main figure) and with (inset) nuisance parameters that
must be marginalised over. The main figure (inset) shows
CPU times for 2000 (1000) realisations respectively. The various
curves are (a) Fisher-sum, (b) A-optimality, (c) Determinant
and (d) D-optimality. Without nuisance parameters
the Fisher-sum is significantly faster while A-optimality is the
worst (since it is the only one that requires the inverse $F^{-1}$,
and since $\det(F^{-1}) = 1/\det(F)$). When nuisance parameters
must be marginalised over the playing field is much more even
and there is little CPU advantage to any of the FoM.

If the dimensionality of the survey configuration space,
s, is small this can be done using grid techniques or if the
dimensionality of s is large (e.g. in the case where one
has of order 100 redshift bins) one can use Markov
Chain Monte Carlo (MCMC) techniques [39] with a standard
engine (e.g. Metropolis-Hastings), with the FoM playing
the role of the likelihood in standard implementations.
Otherwise one can consider other extremisation methods
[41].
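
A rough sketch of such a Metropolis-style search is given below. Everything here is an illustrative assumption: the toy FoM, the three observing-time fractions, the step size and the "temperature" that sets how strongly the chain prefers higher FoM. The reflected and renormalised proposal is not exactly symmetric, so this should be read as a stochastic optimiser rather than an exact sampler.

```python
import numpy as np

def fom(s):
    """Hypothetical FoM for time fractions s: log det of a diagonal F + P."""
    gains = np.array([4.0, 2.0, 1.0])     # assumed per-bin sensitivities
    return float(np.sum(np.log(gains * np.asarray(s, dtype=float) + 0.1)))

def metropolis_design(fom, s0, n_steps=5000, step=0.05,
                      temperature=0.05, seed=0):
    """Random-walk Metropolis over observing-time fractions, with
    exp(FoM / temperature) playing the role of the likelihood."""
    rng = np.random.default_rng(seed)
    s = np.asarray(s0, dtype=float)
    f = fom(s)
    best_s, best_f = s.copy(), f
    for _ in range(n_steps):
        trial = np.abs(s + step * rng.normal(size=s.size))
        trial /= trial.sum()              # keep total observing time fixed
        f_trial = fom(trial)
        # Accept uphill moves always, downhill moves with Boltzmann weight.
        if np.log(rng.uniform()) < (f_trial - f) / temperature:
            s, f = trial, f_trial
            if f > best_f:
                best_s, best_f = s.copy(), f
    return best_s, best_f

best_s, best_f = metropolis_design(fom, s0=[0.8, 0.1, 0.1])
```

For a FoM that must be minimised, its negative can be passed instead.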



IV. OPTIMISING THE SURVEY 

Survey optimisation occurs by computing the FoM for
a range of survey geometries and simply selecting the geometry
with the minimum/maximum FoM (depending
on which of the candidate I's was chosen). In most situations
(where the problem is nonlinear in the parameters
$\theta_\mu$) analytical solutions for the FoM will not be available
(although see below for special cases where there are) and
instead we must compute it numerically.

A. Finding the optimal niche in a competitive
market

Optimising the survey in the presence of existing or 
expected data sets from other experiments is easy to im- 
plement within IPSO. Consider a situation of interest in 
the current climate of survey design: optimising a survey 
given the expected SNAP and Planck data. 

To find the optimal niche given these or any set of
'competing' surveys we simply include, along with the
prior information encoded in P, the expected Fisher matrices
for all the other surveys, denoted $\mathcal{F}^{\rm SNAP}$, $\mathcal{F}^{\rm Planck}$
etc. Hence one will now have:

$$I(s, \theta) = \Phi\left(F(s) + P + \mathcal{F}^{\rm SNAP} + \mathcal{F}^{\rm Planck} + \ldots\right) \qquad (12)$$

where e.g. $\Phi(A) = \mathrm{tr}(A^{-1})$ or $\log\det(A)$ depending on
whether one chooses (6) or (9) for $I$, or a corresponding
generalisation for any of the FoM considered in section
(III). Of these terms, only the first - F - actually depends
on the candidate survey geometry, s, although all
terms (except P) will in general also change depending
on position in the parameter space $\Theta$. The FoM will
then include the information from SNAP, Planck and any
other surveys included and extremising it will automatically
select the survey geometry which provides the optimal
niche given the other surveys. This is a systematic
approach to 'orthogonalising' error ellipses for different
experiments.
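
The 'orthogonalising' behaviour can be seen in a two-parameter toy problem. In this sketch the competing survey and the candidate survey have Fisher matrices of the same shape, and the only design freedom is the angle of the candidate's well-measured direction; all numbers are hypothetical and the FoM is evaluated at a single point in the parameter space.

```python
import numpy as np

def niche_fom(F_survey, P, competitors, criterion="D"):
    """Phi(F + P + sum of competitor Fisher matrices) at one point in Theta."""
    total = F_survey + P + sum(competitors)
    if criterion == "D":
        return float(np.linalg.slogdet(total)[1])   # to be maximised
    return float(np.trace(np.linalg.inv(total)))    # "A": to be minimised

def rotated_fisher(angle, strong=100.0, weak=1.0):
    """Hypothetical 2x2 Fisher matrix whose well-measured direction is
    rotated by `angle` (radians) in the parameter plane."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([strong, weak]) @ R.T

P = 0.01 * np.eye(2)                      # weak prior
competitor = rotated_fisher(0.0)          # stand-in for a competing survey
angles = np.linspace(0.0, np.pi / 2, 10)
foms = [niche_fom(rotated_fisher(a), P, [competitor]) for a in angles]
best_angle = float(angles[int(np.argmax(foms))])
```

The D-optimal candidate is the one whose error ellipse is perpendicular to the competitor's, i.e. the survey that measures best what the competitor measures worst.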

In the case that all the matrices in eq. (12) are diagonal
(or can be made diagonal by a single similarity transformation)
then the A-optimal solution will be given simply
by minimising the integral of $\mathrm{tr}\,F^{-1}$ since all the other
terms ($\mathrm{tr}\,P^{-1}$ etc.) will simply add the same contribution
independent of the candidate survey geometry (by
linearity). In this case the weight matrix could be used to
include prior information and expected results for other
planned surveys.

If on the other hand at least one of the matrices in
eq. (12) is not diagonal, the inverse will not be a diagonal
matrix in general and hence optimisation will depend on
P etc. As a simple example consider the two parameter
case (F, P etc. two-dimensional). The inverse is now
inversely proportional to the determinant of the matrix
sum in (12) and the mixing between the various matrices
will remain after taking the trace.

In the case of D-optimality, we deal directly with the
determinant of the sum in (12). Hence, even in the diagonal
case the prior and SNAP/Planck etc. Fisher
matrices make a crucial contribution to optimisation (for
two or more parameters since otherwise the determinant
is trivial). Consider again diagonal, two-dimensional F
etc. so that $I$ becomes:

$$I(s, \theta_\mu) = \log\left(F_{11} + P_{11} + \mathcal{F}^{\rm SNAP}_{11} + \ldots\right) + \log\left(F_{22} + P_{22} + \mathcal{F}^{\rm SNAP}_{22} + \ldots\right) \qquad (13)$$

All terms contribute to the final optimisation. We dis- 
cuss the optimal solution for this case in the section on 
analytical optimisation. 

Returning briefly to the general expression (12), we can
understand it geometrically in a simple way in terms of
generating mock data. At every point in $\Theta$ we generate
mock data for each of the competing surveys and add it to
the prior data (which might be all current SNIa data
and is of course the same at every point in $\Theta$). This is
done only once. Then, for each candidate survey s we
generate mock data at each point in $\Theta$ for the survey
being optimised. These data change with s; the prior
and competing data do not. One then computes the
resulting FoM and extremises it.

B. Dealing with nuisance parameters 



[Figure 3: schematic of marginalising over nuisance parameters in the Fisher matrix.]

FIG. 3. IPSO integrates only over fundamental parameters
$\theta_a$ while marginalising over the remaining nuisance parameters
by extracting the appropriate sub-covariance matrix.

Although we want to marginalise over the nuisance parameters
$\theta_\alpha$ we do not (by definition of "nuisance") want
them to play an active role in the optimisation process.
Hence we wish to extract the Fisher matrix $F_{ab}$ after
marginalising over the nuisance parameters. This process
is shown in Fig. (3) and is described in detail in
[22,18].

First, the full Fisher matrix, $F_{AB}$, is inverted. The
sub-matrix corresponding to the rows and columns of
the fundamental parameters is then extracted, yielding $F^{-1}_{ab}$,
which can be used directly in the FoM (6) or inverted
for use in (2) or (9). Whether or not there are nuisance
parameters to marginalise over has a significant
bearing on the computational cost, as shown in figure
(2).
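
The invert, extract, invert procedure can be sketched in a few lines. The 3x3 Fisher matrix below is hypothetical, with two fundamental parameters and one nuisance parameter; the result is cross-checked against the Schur complement, which is the standard closed form for a marginalised Fisher matrix.

```python
import numpy as np

def marginalised_fisher(F_full, keep):
    """Fisher matrix of the parameters in `keep` after marginalising the
    rest: invert the full matrix, extract the sub-block, invert again."""
    cov = np.linalg.inv(F_full)
    sub_cov = cov[np.ix_(keep, keep)]
    return np.linalg.inv(sub_cov)

# Hypothetical full Fisher matrix: two fundamental parameters (indices 0, 1)
# and one nuisance parameter (index 2).
F_full = np.array([[10.0, 2.0, 3.0],
                   [ 2.0, 8.0, 1.0],
                   [ 3.0, 1.0, 5.0]])
F_marg = marginalised_fisher(F_full, [0, 1])

# Cross-check with the Schur complement F_aa - F_an F_nn^{-1} F_na.
F_aa, F_an, F_nn = F_full[:2, :2], F_full[:2, 2:], F_full[2:, 2:]
F_schur = F_aa - F_an @ np.linalg.inv(F_nn) @ F_an.T
```

Marginalisation can only lose information: the diagonal entries of `F_marg` are smaller than the corresponding entries of `F_full`.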

C. Addressing specific questions 

Survey designers may wish to optimise their survey 
in order to try to answer specific questions relevant for 
the day. This is particularly appropriate for experiments 
with short design and build lifetimes where one can be 
confident that the question of interest will not be an- 
swered before completion of the experiment. 

A good example is provided by one of the most pressing
questions in dark energy research today, namely is the
dark energy dynamical or is it a cosmological constant?
It is plausible that by the time next-generation surveys
reach final design state there may still be no signal
for dark energy dynamics (this will be the case if the
true origin of acceleration is a cosmological constant). In
that case we may be faced with the prospect of trying
to detect dynamics (and hence rule out a cosmological
constant) at the limit of the resolving power of even
next-generation instruments.

This question can be simply addressed: we want a survey to
discriminate between a cosmological constant and dynamical models.
Model discrimination favours use of D-optimality. We need a FoM which
yields a survey that provides optimally small error bars around the
parameter subspace corresponding to a cosmological constant
($w = -1$). This requires sacrificing accuracy in parameter regions
far from the cosmological constant, where dynamics would be easily
detected even with poor sensitivity. Such a prescription can be
easily implemented by choosing the weight matrix
$W_{\mu\nu}(\theta_\mu)$ or weight function $w$ in eq. (9)
appropriately. For example, if the cosmological constant corresponds
to $\theta_\mu = 0$ for all $\mu$, then we could choose (for
D-optimality)

$$w(\theta_\mu) = \exp\left(-\sum_{\mu,\nu} \beta_{\mu\nu}\,\theta_\mu \theta_\nu\right) \qquad (14)$$

For A-optimality one could choose $W = wU$ where every component of
$U$ is unity.

For $\beta_{\mu\nu} > 0$ this will exponentially suppress the
contribution to the survey design from regions of parameter space far
from the cosmological constant parameter subspace, $\theta_\mu = 0$.
$\beta_{\mu\nu}$ measures the aggressiveness with which the
suppression is implemented in each direction in $\Theta$. But how
should we choose $\beta_{\mu\nu}$?

We want strong suppression in the case where the prior indicates that
any deviations from $\Lambda$CDM, if they exist, are small and where
many models with strong dynamics (i.e. far from $\Lambda$CDM) are
already ruled out, i.e. if the prior variances on the variables are
small, which is equivalent to large entries in the prior precision
matrix, $P$. One choice which implements this idea is:



(15) 



where $\theta^T_\mu$ represents the parameter values of the target
model we are trying to rule out (in this case $\Lambda$CDM and
$\theta^T_\mu = 0$). $\hat\theta_\mu$, on the other hand, are the
maximum likelihood estimators of the $\theta_\mu$ using only the
prior data. In practice this would benefit from softening to keep the
components of $\beta_{\mu\nu}$ finite in chance cases where one or
more of the target parameters coincide to high accuracy with the
prior best estimates of those same variables.

What this choice does is the following: if the best-fit 
to prior data is close to the target model and if the prior 
parameter variances are small, then the suppression will 
be strong, reflecting the need to aggressively optimise the 
error bars near the target model. If, on the other hand,
the prior-data best-fit is far from the target, or the data
is very poor so the variances are large, then the resulting
suppression is weak, i.e. small $\beta_{\mu\nu}$. The expression (15)
is appropriate whenever the target model corresponds to
a single point in $\Theta$.

In the case where one is trying to discriminate between two classes
of models corresponding to sub-spaces $\Gamma_{1,2} \subset \Theta$,
which both have non-zero volume in $\Theta$
($\mathrm{Vol}_\Theta(\Gamma_{1,2}) > 0$), a different suppression
expression is needed since a single target point no longer exists. In
this case we propose to replace the denominator of (15) with an
estimator of the minimum distance between the two sub-spaces. If the
sub-spaces are far apart it will be much easier to differentiate them
than if they are close to each other. Therefore we propose:

$$\beta_{\mu\nu} = \frac{P_{\mu\nu}}{d(\Gamma_1,\Gamma_2)} \qquad (16)$$

where

$$d(\Gamma_1,\Gamma_2) = \min\{\, d(\theta_1,\theta_2) \mid \theta_1 \in \Gamma_1,\ \theta_2 \in \Gamma_2 \,\} \qquad (17)$$

and where $d(\theta_1,\theta_2)$ gives the distance between any two
points $(\theta_1,\theta_2)$ in the natural metric of $\Theta$ (which
will typically be a flat, Euclidean space). As an interesting aside,
in the case where either subspace is disconnected one can still use
this expression, although there may be more optimal choices. As an
example of disconnected subspaces consider the kink parametrisation
of dark energy [7,38]. There, models indistinguishable from
$\Lambda$CDM correspond to two disconnected subspaces; namely
$(w_0 = -1, w_m = -1)$ with $(\Delta, a_t)$ arbitrary on the one hand
and $(w_0 = -1, a_t < 10^{-4})$ with $w_m$ arbitrary (and $\Delta$
sufficiently large) on the other.
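
As a rough numerical illustration of the minimum-distance prescription, the sketch below samples two sub-spaces on finite point sets and takes the smallest pairwise Euclidean distance, then divides a prior precision matrix by that distance, which is the form the text around eq. (16) suggests. The sampling of the sub-spaces, the function names and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def min_distance(gamma1, gamma2):
    """Minimum Euclidean distance between two sampled sub-spaces, cf. eq. (17).
    gamma1, gamma2: arrays of shape (n_points, n_params)."""
    diff = gamma1[:, None, :] - gamma2[None, :, :]    # all pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1)).min()

def beta_from_distance(prior_precision, gamma1, gamma2):
    """Suppression matrix: prior precision divided by the sub-space distance
    (assumed form of eq. (16))."""
    return prior_precision / min_distance(gamma1, gamma2)

# Hypothetical 2-parameter example: two parallel line segments at x = 0 and x = 1.
t = np.linspace(0.0, 1.0, 101)
gamma1 = np.column_stack([np.zeros_like(t), t])
gamma2 = np.column_stack([np.ones_like(t), t])
P = np.diag([4.0, 9.0])                               # toy prior precision matrix
beta = beta_from_distance(P, gamma1, gamma2)
```

With finite sampling the result is only an upper bound on the true minimum distance; a denser sampling or a dedicated minimiser would be needed for curved sub-spaces.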

For any of these prescriptions, however, the extreme limit
$\beta_{\mu\nu} \to \infty$ yields a matrix of delta-functions for
$W_{\mu\nu}$. In this case the integration over $\Theta$ becomes
trivial and optimising the survey collapses to finding the smallest
error bars at the single chosen point in the parameter space. Of
course, building a survey to best address a specific question reduces
the cross-section for serendipitous discovery. As an example,
consider the issue of dark energy dynamics once again.

Concentrating around the cosmological constant, as implied by
eq. (14), may (depending on the parameter space $\Theta$) maximise
sensitivity at intermediate redshifts ($z < 1$) in an effort to
detect deviations from the predictions of $\Lambda$CDM. As a result,
the survey design may have no sensitivity at very high redshift,
$z > 3$, where dark energy is irrelevant in the $\Lambda$CDM model,
and hence the ability to make serendipitous discoveries (such as a
sudden input of radiation or an oscillating dark energy equation of
state) is weak. Having said this, a sufficiently good choice of
parameter space, $\Theta$, will allow for this possibility, since
serendipitous discovery at high-z would automatically imply dynamics.
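
The effect of the exponential weight, and of the sharp-suppression limit discussed above, can be illustrated with a one-parameter toy calculation. The sketch assumes the weight has the Gaussian-type form $\exp(-\sum \beta_{\mu\nu}\theta_\mu\theta_\nu)$ with the target model at $\theta = 0$; the function `g` is a hypothetical stand-in for any $\theta$-dependent quantity entering the FoM, and all names are illustrative.

```python
import numpy as np

def weight(theta, beta):
    """Gaussian-type weight for a parameter vector theta and suppression
    matrix beta; the target model is assumed to sit at theta = 0."""
    theta = np.atleast_1d(theta)
    return np.exp(-theta @ beta @ theta)

def weighted_average(g, grid, beta):
    """Weighted average of g over a 1-D grid of theta values."""
    w = np.array([weight(t, np.array([[beta]])) for t in grid])
    vals = np.array([g(t) for t in grid])
    return (w * vals).sum() / w.sum()

g = lambda t: 1.0 + t ** 2            # hypothetical theta-dependent quantity
grid = np.linspace(-1.0, 1.0, 2001)
broad = weighted_average(g, grid, beta=0.0)      # flat weight: plain average
sharp = weighted_average(g, grid, beta=1.0e4)    # strong suppression
```

As the suppression grows, the weighted average approaches the value of `g` at the target point, which is the delta-function limit in which only a single model informs the design.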



D. Implementing survey constraints 

Each candidate survey geometry, indexed by s, will 
differ from other candidates in one of two conceptually 
different ways, which we label hard and soft differences. 
Hard differences correspond to discontinuous, radical changes in
survey structure. Typically hard differences will be unique to each
experiment and correspond to proposals to supplement the fiducial
survey with a fundamentally different component. The possibility of
including a weak lensing survey in the SNAP mission strategy is a
good example of a hard survey change [29].

Typical next-generation dark energy experiments will spend
considerable time (1-5 years) surveying a considerable fraction of
the sky with high resolution, obtaining both spectra and photometric
data at multiple frequencies. As a result many opportunities exist
for overlap between the various dark energy tests. For example, the
proposed Dark Energy Survey [12] would obtain constraints on dark
energy from cluster counts, weak lensing, type Ia supernovae, baryon
oscillations and extraction of the ISW effect through
cross-correlation with Planck. The main question is whether spending
time on the extra component justifies the corresponding loss of
constraints from the main part of the survey (in the case of SNAP,
detection of SNIa).

On the other hand, soft differences in survey geometry correspond to
smooth changes in the parameters underlying the survey. For example,
at fixed total observing time, what is the optimal observing time for
each redshift bin? Or, what is the optimal split between survey area
and survey depth?

If hard differences are being considered then the output of the
optimisation must answer whether they are justified and, if so, what
the optimal resulting soft configuration should be.§§ In a follow-up
paper we will consider optimisation of the KAOS survey [11] given
expected constraints from the SNAP satellite.

For linear problems (see e.g. section V B) it is possible to perform
the optimisation analytically. In most realistic cases, however, this
will not be possible. If there are many different candidate survey
geometries then it is likely that MCMC methods will be the preferred
optimisation technique. The disadvantage of the MCMC method is that
it is not naturally suited to deal with both hard and soft
differences, but only soft differences where the space of survey
geometries is connected. In the case of hard differences it may not
always be possible to make the space of survey geometries connected.
In such a situation grid or mixed grid-MCMC methods may have to be
used.

In most cases, however, hard differences can be simply accounted for
by considering the combined space containing the survey parameters
for each of the components of the survey. For example, consider the
case of SNAP and whether it should undertake a weak lensing survey
(with 'soft' parameters $\{S_i\}$) in addition to the baseline SNIa
survey (with 'soft' parameters $\{\mu_i\}$). MCMC techniques could be
used on the combined set $\{S_i, \mu_i\}$ as long as the survey
geometry with no weak lensing occupies a finite volume in the full
space (i.e. does not just correspond to a measure-zero subspace),
otherwise the random sampling inherent in MCMC will never select that
single survey geometry. Clearly, implementation will depend on the
specific survey under consideration.
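
The measure-zero issue can be made concrete with a small sampling experiment. The sketch below is one possible way (an assumption, not a prescription from the paper) of giving the "no weak lensing" option a finite volume: a sampling coordinate `x` is mapped to the weak-lensing time fraction so that a whole interval of `x` corresponds to zero lensing time. Uniform random draws stand in for MCMC proposals, and the threshold `x_off` is a hypothetical tuning parameter.

```python
import numpy as np

def weak_lensing_fraction(x, x_off=0.2):
    """Map a sampling coordinate x in [0, 1] to the fraction of time spent on
    weak lensing. All x < x_off map to 'no weak lensing', so that option
    occupies a finite volume (x_off) of the sampled space."""
    return np.where(x < x_off, 0.0, (x - x_off) / (1.0 - x_off))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=10_000)        # stand-in for MCMC proposals

frac_naive = x                                # 'no lensing' is the single point x = 0
frac_embedded = weak_lensing_fraction(x)

hits_naive = np.count_nonzero(frac_naive == 0.0)
hits_embedded = np.count_nonzero(frac_embedded == 0.0)
```

In the naive parametrisation the pure-SNIa geometry is essentially never visited, while in the embedded version it is visited in roughly a fraction `x_off` of the draws.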

V. EXAMPLES OF OPTIMISATION 

Later we will consider how to construct the optimum error bars in a
set of $n$ redshift bins located at fixed redshifts $z_i$,
$i = 1 \ldots n$, with variable error bars, $\epsilon_i$. First we
consider the opposite problem: that of one-point optimisation.



A. Where is the golden point? One-point 
optimisation 

If we are allowed to make one observation (with fixed 
error bar independent of redshift), then at what redshift 
should we place it? We will call this the golden point 
problem. Unlike true optimisation there is no constraint 
to implement (since we assume the error bar is indepen- 
dent of z). Hence the problem is much simpler and will 
allow us to get a feeling for some of the issues involved 
in optimisation. 

A simple analytical solution exists in the case where we also only
consider a one-dimensional parameter space, since then the various
FoM become trivial. Further, in the case that the Fisher 'matrix',
$F$, is independent of $\theta$ then one can immediately see that the
three proposed optimisations (A-optimality, maximising $\det(F)$ and
D-optimality) all become equivalent. This follows since minimising
$F^{-1}$ is clearly the same as maximising $F = \det(F)$ or
$\log\det(F)$, since the logarithm is a monotonic function. On the
other hand, as soon as $F = F(\theta)$, the three optimum solutions
will no longer coincide in general. Clearly the golden solution will
also depend on what quantity is being measured.

We will consider two examples of one-point optimisation (see also
[13] for a similar discussion but without the integration over
$\Theta$). The first is to determine the $H(z)$-golden point in a
flat $\Lambda$CDM universe where the parameter is
$\theta = \Omega_m$, the matter density today, and
$\Omega_\Lambda = 1 - \Omega_m$. We will consider for simplicity a
measurement of the Hubble constant. Consideration of $d_A(z)$ or
number counts is analogous. The Fisher point (the only component of
the Fisher matrix), $F$, is then written in terms of
$E(z) = H(z)/H_0$ as:

$$F = \left(\frac{\partial E}{\partial \Omega_m}\right)^2 \qquad (18)$$

where we have dropped normalisation constants which will not affect
the optimisation (such as $H_0$ and $\epsilon$, the
redshift-independent error bar). The game is then to find the value
of $z$ which extremises the corresponding FoM. The three
optimisations then correspond to minimising
$\int F^{-1}\, d\Omega_m$ or maximising either $\int F\, d\Omega_m$
or $\int \log(F)\, d\Omega_m$ (A-optimality, determinant and
D-optimality respectively).



§§ In reality, treating hard differences fully is nontrivial because
the resulting constraints will depend sensitively on engineering,
manufacturing and design ingenuity in integrating the hard changes
with the fiducial survey.

FIG. 4. Priors and parameter-space averaging affect the golden point.
The one-point D-optimum redshift for Hubble constant measurements is
given by the zeros of the curves (a)-(e) for a constant $w$ with
different priors on the parameter space $\theta = \{w\}$. The various
curves correspond to top-hat priors with lower and upper limits on
$w$, i.e. $(w_{\rm lower}, w_{\rm upper})$, of: (a) (-1.5,-0.8),
(b) (-1.5,-0.6), (c) $\Lambda$CDM only, (d) (-1,-0.8) and
(e) (-1,-0.6). The optimum redshift varies from $z_{\rm opt} = 0.86$
(a) to $z_{\rm opt} = 2.6$ depending on the prior assumed. Allowing
larger values of $w$ increases the importance of the dark energy at
higher redshift and naturally shifts the golden point to higher
redshift. Conversely, a more negative lower bound forces the golden
point to smaller redshifts and makes the change in the optimum
sharper.

Looking at eq. (18) we immediately see that asymptotically
$F \to (1 + z)^3$ for large $z$ since the derivative of $E$ w.r.t.
$\Omega_m$ is monotonically increasing, and hence $F$ is maximised by
simply going to the largest available redshift. $\log(F)$ shares the
same property so D-optimality makes the same prediction.
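
This monotonic behaviour is easy to check numerically. The sketch below assumes the single-measurement Fisher point is $F = (\partial E/\partial\Omega_m)^2$ with $E^2 = \Omega_m(1+z)^3 + 1 - \Omega_m$, i.e. with normalisation constants dropped; the redshift grid and the flat prior range on $\Omega_m$ are hypothetical choices made only for illustration.

```python
import numpy as np

def fisher_omega_m(z, omega_m):
    """Single-measurement Fisher 'point' for theta = Omega_m in flat LCDM,
    F = (dE/dOmega_m)^2 with E^2 = Omega_m (1+z)^3 + 1 - Omega_m."""
    e2 = omega_m * (1.0 + z) ** 3 + 1.0 - omega_m
    dE = ((1.0 + z) ** 3 - 1.0) / (2.0 * np.sqrt(e2))
    return dE ** 2

z = np.linspace(0.05, 5.0, 200)
omega_m = np.linspace(0.1, 0.5, 41)                 # hypothetical flat prior range
F = fisher_omega_m(z[:, None], omega_m[None, :])    # shape (n_z, n_omega)

fom_det = F.mean(axis=1)            # parameter-space average of F
fom_d = np.log(F).mean(axis=1)      # parameter-space average of log F
z_best = z[np.argmax(fom_d)]        # golden point on this grid
```

Both averaged figures of merit rise monotonically with redshift on this grid, so the golden point sits at the largest redshift considered.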

This same conclusion is reached if we consider a different example.
Let us fix $\Omega_m$ and choose $w(z)$ so that $w(z) = -1$ for
$z < z_*$ and $w(z) = 0$ for $z > z_*$. This is a crude model for a
very rapid transition of the type that is actually a rather good fit
to current SNIa data [38]. What is the golden redshift once we
average over $z_*$, the only parameter in the problem? A brief
calculation confirms the obvious: $z_*$ is determined by the
expansion rate, which is best measured at high redshift. Hence,
perhaps counter-intuitively, the best way to detect a sudden
transition through $H$ is by a measurement at high redshift (assuming
redshift-independent error bars).

Now let us consider an example in which the golden 
point is not the maximum redshift available to the survey. 
Again consider a single Hubble constant measurement but this time let
us consider $\theta = \{w\}$, i.e. consider a constant equation of
state parameter for the dark energy. We will again assume a flat
universe and this time with a 70/30 split between the energy density
of dark energy and matter today. None of these assumptions is crucial
and in a full analysis one would include these parameters in
$\Theta$ to be integrated over in the optimisation.

For a general parametrisation of the dark energy equation of state we
have

$$E^2 = \Omega_m (1+z)^3 + (1 - \Omega_m)\, f(z, \theta_\mu) \qquad (19)$$

where $f$ controls the time evolution of the dark energy:

$$f(z,\theta_\mu) = \exp\left(3\int \frac{1 + w(z,\theta_\mu)}{1+z}\, dz\right) \qquad (20)$$

In this case, we have $f = (1+z)^{3(1+w)}$. Again the Fisher matrix
is a single point, but this time the relevant derivatives are not
necessarily monotonic. For a general single parameter expansion of
$f$ we have:

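$$F = \left(\frac{\partial E}{\partial\theta}\right)^2 = \frac{9(1-\Omega_m)^2}{4}\,\frac{f^2}{E^2}\left(\int \frac{\partial w}{\partial\theta}\,\frac{dz}{1+z}\right)^2 \qquad (21)$$

$f/E^2$ is the ratio $\rho_{de}/\rho_{tot} \equiv \Omega_{de}$. Since
in most standard dark energy models $\Omega_{de} \to 0$ with
increasing $z$ we see that measuring dark energy parameters will
favour observations at lower redshift. In the case of constant $w$
the redshift-integral collapses to $\log(1 + z)$.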

To compute the golden point we need to compute 
the FoM and then solve the equation dFoM/dz = 0. 
This is done most easily numerically. Figure (4) shows 
dFoM/dz as a function of z and the golden point cor- 
responds to the zero of this function. In the figure we 
consider only D-optimality defined by eq. (9) and ne- 
glect the prior precision matrix P. 

However, in computing the FoM we must integrate over $w$. But what
range of $w$ is reasonable to consider? The weak energy condition
imposes $w > -1$, and $w > 1$ seems unphysical. Such priors can be
implemented through the weight function $w(\theta)$ in eq. (9). What
effect does imposing such theoretical priors have on the golden
point?

Figure (4) shows several different curves which differ only in the
choice of $w(\theta)$. In each case we choose it to be a top-hat
function which is zero outside of the range
$(w_{\rm lower}, w_{\rm upper})$. As we increase $w_{\rm upper}$ we
include solutions into the optimisation which are more and more
dynamically important at high redshift (since
$\rho_{de} \propto (1+z)^{3(1+w)}$) and hence the golden point moves
to higher redshift. Conversely, if we decrease $w_{\rm lower}$ the
dark energy becomes less and less apparent at high redshift and the
golden point rapidly shrinks towards zero.
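
The dependence of the golden point on the top-hat prior can be explored with a short grid calculation. The sketch below assumes $F = (\partial E/\partial w)^2$ for a single $H(z)$ measurement with constant $w$, $\Omega_m = 0.3$, a D-optimality-type FoM given by the top-hat average of $\log F$, and no prior precision matrix. The grid ranges and function names are hypothetical, and the sketch is not guaranteed to reproduce the exact optimum redshifts quoted for figure (4).

```python
import numpy as np

OMEGA_M = 0.3  # flat universe with a 70/30 dark energy/matter split

def log_fisher_w(z, w):
    """log of F = (dE/dw)^2 for a single H(z) measurement and constant w,
    with E^2 = Omega_m (1+z)^3 + (1 - Omega_m) (1+z)^(3(1+w))."""
    x = 1.0 + z
    f = x ** (3.0 * (1.0 + w))
    e2 = OMEGA_M * x ** 3 + (1.0 - OMEGA_M) * f
    dE = 3.0 * (1.0 - OMEGA_M) * f * np.log(x) / (2.0 * np.sqrt(e2))
    return np.log(dE ** 2)

def golden_point(w_lower, w_upper, z=np.linspace(0.05, 10.0, 2000), n_w=201):
    """Redshift maximising the top-hat average of log F over [w_lower, w_upper]."""
    w = np.linspace(w_lower, w_upper, n_w)
    fom = log_fisher_w(z[:, None], w[None, :]).mean(axis=1)
    return z[np.argmax(fom)]

z_lcdm = golden_point(-1.0, -1.0)       # fiducial model only, no averaging
z_wide = golden_point(-1.0, -0.6)       # allow less negative w
z_phantom = golden_point(-1.5, -0.8)    # allow more negative w
```

Comparing `z_phantom`, `z_lcdm` and `z_wide` shows the trend described in the text: extending the prior to less negative $w$ pushes the optimum to higher redshift, while extending it to more negative $w$ pulls it down.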

This simple example illustrates two key points: (1) Not integrating
over the parameter space but simply taking a fiducial model such as
$\Lambda$CDM (corresponding to curve (c) in fig. (4)) loses a great
deal of the potential to optimise a survey. (2) Given the
infinite-dimensional space of dark energy models, optimisation cannot
be done without specifying the class of dark energy models one wants
to detect.

As a more sophisticated example, let us parametrise the dark energy
density in terms of $f(z) = \rho_{de}(z)/\rho_{de}(0)$ by considering
$n$ independent redshift bins centered for convenience at
$z_j = j\Delta$ with

$$f(z_k) = 1 + \sum_{j=1}^{k} \theta_j \qquad (22)$$

where $\theta_k = f(z_k) - f(z_{k-1})$ control the change in $f$
between bins. The WEC ($w > -1$) corresponds to $\theta_j > 0$ for
all $j$. Let us consider the golden-point problem with the slight
simplification that we only consider putting our observation at one
of the redshifts $z_j$. Note that the parameters are perfectly
correlated: a change in any of the $\theta_j$ can be compensated for
by an opposite change in any of the other parameters. A short
calculation shows that the Fisher matrix for this system is very
special:



$$F_{ab}(z_i) \propto E^{-2}(z_i)\, M_{ab} \qquad (23)$$

where $M_{ab} = 1\,(0)$ if $\max(a,b) \leq (>)\ i$. In other words,
the Fisher matrix vanishes beyond the bin at which the observation is
being made. As a result $F$ is singular, $\det(F) = 0$, reflecting
the perfect correlation of the parameters (the error ellipses are
lines with zero volume). In fact this is a rather general property of
one-point optimisation since one is constraining the function at a
single point. This perfect correlation disappears when we have two or
more measurements since one begins to test the shape of the function.




FIG. 5. Four different optimisations differing only by the values of
$(\theta_{\min}, \theta_{\max})$. Negative $\theta_{\min}$ pushes the
optimum to lower redshift. However, due to the non-trivial
integration structure the results are not obvious and there are
multiple extrema. (Horizontal axis: bin number.)



Because of this degeneracy we cannot apply A-optimality, D-optimality
or determinant optimality. However, Fisher-sum optimality, eq. (2),
will work, where we take all the entries of $W_{ab}$ to be equal
top-hat functions, equal to unity for
$\theta_i \in [\theta_{\min}, \theta_{\max}]$ and zero outside. This
allows us to impose prior constraints (such as implementing the WEC,
which implies $\theta_{\min} > 0$).

The FoM then becomes:

$$\mathrm{FoM}(z_i) \propto \int \sum_{a,b} F_{ab}(\theta_j)\, d\theta_1 \ldots d\theta_n \propto i^2\, V^{\,n-i} \int \frac{d\theta_1 \ldots d\theta_i}{E^2(z_i, \theta_j)} \qquad (24)$$

where $V = (\theta_{\max} - \theta_{\min})$ and where we are trying
to find the optimum value of $i$ which determines the optimum
redshift.

Figure (5) shows the resulting FoM and optimal bin numbers (marked
with arrows) as a function of $i$ and shows the strong dependence on
the choice of $\theta_{\min}$ and $\theta_{\max}$, which are inputs
from the survey designer. Again this highlights the need for precise
design requirements.
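
A Monte Carlo version of this Fisher-sum calculation can be sketched as follows. It assumes the Fisher matrix has the form $F_{ab} \propto M_{ab}/E^2(z_i)$ with $f(z_i) = 1 + \sum_{j \le i}\theta_j$, uses the mean over uniform draws in place of the integral (which only changes the FoM by a constant factor), and restricts attention to non-negative $\theta_j$. The values of `OMEGA_M`, `DELTA` and the number of bins are hypothetical, since the text does not specify them for this example.

```python
import numpy as np

OMEGA_M = 0.3   # assumed matter density
DELTA = 0.5     # assumed bin spacing, z_j = j * DELTA

def fisher_matrix(i, theta, n):
    """Fisher matrix (up to constants) for one H measurement in bin i (1-based):
    F_ab = M_ab / E^2(z_i), with M_ab = 1 if max(a, b) <= i and 0 otherwise."""
    z_i = i * DELTA
    f_i = 1.0 + theta[:i].sum()
    e2 = OMEGA_M * (1.0 + z_i) ** 3 + (1.0 - OMEGA_M) * f_i
    M = np.zeros((n, n))
    M[:i, :i] = 1.0
    return M / e2

def fisher_sum_fom(i, n, theta_min, theta_max, n_samples=20_000, seed=0):
    """Monte Carlo estimate of the top-hat-weighted sum of Fisher entries."""
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(theta_min, theta_max, size=(n_samples, n))
    f_i = 1.0 + thetas[:, :i].sum(axis=1)
    e2 = OMEGA_M * (1.0 + i * DELTA) ** 3 + (1.0 - OMEGA_M) * f_i
    return (i ** 2 / e2).mean()          # sum over a, b of M_ab equals i^2

n = 5
F_example = fisher_matrix(3, np.full(n, 0.2), n)   # singular, rank-one matrix
fom = np.array([fisher_sum_fom(i, n, 0.0, 1.0) for i in range(1, n + 1)])
best_bin = 1 + int(np.argmax(fom))
```

Changing `theta_min` and `theta_max` changes the shape of `fom` across bins, which is the designer-dependent behaviour the figure illustrates.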



B. Analytical optimisation and linearity 

There are certain cases where the optimisation can be performed
analytically. The standard example is the case where the covariance
matrix and weight matrix $W_{\mu\nu}$ are independent of the
underlying parameters $\theta_\mu$, in which case the integration of
any of the choices for the integrand $I$ is trivial, yielding a
weighted volume factor that is the same for all survey geometries and
hence is irrelevant to optimisation. In this case, the problem
reduces to one amenable to analytical solution using, e.g., Lagrange
multipliers.

Consider A-optimality, given by eq. (6), in the limit in which $F$
and $P$ are diagonal and the problem is linear so the Fisher matrix
does not depend on the $\theta_\mu$. Under these approximations $P$
makes no contribution to the optimisation and we have (c.f. eq. 7)



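$$\mathrm{FoM} \propto \sum_\mu \sigma_\mu^2 = \sum_\mu \left[\sum_{i=1}^{N_p} \frac{1}{\epsilon_i^2}\left(\frac{\partial X_i}{\partial\theta_\mu}\right)^2\right]^{-1} \qquad (25)$$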



where $N_p$ is the number of data points or redshift bins in the
survey and $X_i = X(z_i)$ is the underlying variable of interest
(such as luminosity or angular diameter distance). In general $N_p$
may well vary depending on the survey geometry, but here we consider
it fixed.

We are interested in finding the optimal experimental error bars we
should achieve for each redshift bin, $\epsilon_i$, which as a group
optimise the FoM. The $\epsilon_i$ are the direct output from each
survey geometry, $s$. While in any specific application it is the
(soft) survey parameters that we would vary and optimise over, for
generality we will find the optimum distribution.



FIG. 6. Schematic illustration of a survey geometry. For a given
observable quantity of interest, $X$ (such as luminosity distance),
the $1\sigma$ error bars, $\epsilon_i$, in each of several redshift
bins (here $i = 1 \ldots 6$) are free to vary subject only to a
constraint such as eq. (26).

Of course, without any further constraint, any reasonable FoM will be
optimised by choosing $\epsilon_i = 0\ \forall i$. Clearly we need to
impose constraints such as cosmic variance, shot noise, finite
resolution, systematic errors, finite observing time etc. As a model
of such errors we optimise the FoM subject to the constraint:

N 

/(^)=E(S-f)<°- (28) 

i=l U > 

The optimal solution saturates the bound. This is a simple encoding
of the physical constraint that some weighted sum of the errors must
be larger than a minimum, $\epsilon_*$, and ensures that the error in
each bin is bounded from below by
$\epsilon_i > (\epsilon_*/\alpha_i)^{1/n}$, so that no one error bar
can be made arbitrarily small (as would be allowed if we considered a
linear sum of the $\epsilon_i$ instead).

Here the $\alpha_i$ characterise the efficiency of observing in the
$i$th bin while $n$ quantifies the nonlinearity of the constraint. It
is typical that error bars at high-z are worse than those at lower
redshift given equal resources (e.g. high-z objects are typically
fainter and require longer observing time to get a spectrum or a good
light curve in the case of SNIa). Hence in this example we expect
that the $\alpha_i$ would be an increasing function as $i$ (and $z$)
increases, expressing the increased cost of obtaining constraints at
high redshift. An effective dependence of
$\alpha_i \propto (1 + z)^6$ has been suggested in the literature for
follow-up of SNIa [16].

A naive construction of a typical constraint is as follows: assume
the total observing time, $T$, for the survey is fixed and we are
able to divide up this time between target objects in various
redshift bins, appropriate in the case where target position and
redshift are already known. The resulting error bars will typically
scale as the square root of the observing time (or the number of
objects) in each bin and hence $n = 2$. However there is no solution
for $n = 2$*** while for $n < 2$ we have only a minimum (i.e. the
worst possible solution). For $n > 2$ we have a maximum and we
therefore focus on this case.

We can now find the optimal distribution of error bars analytically
using Lagrange multipliers. Consider the function
$y = \mathrm{FoM} + \lambda f(\epsilon_i)$. The extremum of $y$ also
corresponds to the extremum of the FoM as long as we impose
$\partial y/\partial\lambda = 0$, which enforces the constraint
$f(\epsilon_i) = 0$. Solving the resulting set of equations found
from $\partial y/\partial\epsilon_i = 0$ gives the minimum FoM and
optimal error bars (for $n > 2$). For a single parameter
($\theta_\mu$) we have:

Ys '^ (f> {^)) 

$Y$ is simply a normalisation that sets the overall scale of the
error bars but is otherwise irrelevant. We remind the reader that $n$
is determined by the constraint (26). The key point of this
simplified solution is that it gives an intuitively reasonable
answer: one should spend most of one's time getting small error bars
(i.e. observing the most) where efficiency is high (small
$\alpha_i$) and where the resulting constraints on the parameters,
$\theta_\mu$, are large, viz. where the
$\partial X_i/\partial\theta_\mu$ are large.

This solution is applicable (at least in the Fisher matrix
approximation) whenever the underlying quantity of interest (such as
the distance) is linear in each parameter and we are only interested
in soft changes to the survey (i.e. $n$ and the $\alpha_i$ are always
the same).

In the realistic case where we are optimising w.r.t. multiple
parameters ($\Theta$ is two or higher dimensional) equation (28) is
modified in a natural way (in each bin $i$ we now have a sum over the
$n$ derivatives of $X$ w.r.t. the parameters $\theta_\mu$). We do not
give the formula explicitly since in most practical applications a
numerical optimisation will be easiest and most efficient to
implement.
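
The Lagrange-multiplier argument can be checked numerically in the single-parameter case. Because the constraint equation is not legible in this copy, the sketch below ASSUMES a resource constraint of the form $\sum_i \alpha_i \epsilon_i^{-n} = \mathrm{budget}$, with larger $\alpha_i$ meaning a more costly bin, and maximises the single-parameter Fisher information $F = \sum_i (\partial X_i/\partial\theta)^2/\epsilon_i^2$ (equivalently, minimises the variance $1/F$). Under that assumption the stationary point has $\epsilon_i \propto (\alpha_i/(\partial X_i/\partial\theta)^2)^{1/(n-2)}$. The derivative values, costs and budget are hypothetical.

```python
import numpy as np

def optimal_errors(deriv, alpha, n, budget):
    """Error bars maximising F = sum_i deriv_i^2 / eps_i^2 subject to the ASSUMED
    constraint sum_i alpha_i * eps_i^(-n) = budget, valid for n > 2.
    Lagrange multipliers give eps_i proportional to (alpha_i / deriv_i^2)^(1/(n-2))."""
    if n <= 2:
        raise ValueError("an interior maximum only exists for n > 2")
    shape = (alpha / deriv ** 2) ** (1.0 / (n - 2))            # un-normalised eps_i
    scale = ((alpha * shape ** (-n)).sum() / budget) ** (1.0 / n)
    return scale * shape                                        # saturates the constraint

def fisher(eps, deriv):
    """Single-parameter Fisher information for independent bins."""
    return (deriv ** 2 / eps ** 2).sum()

z = np.array([0.5, 1.0, 1.5, 2.0])
deriv = 1.0 / (1.0 + z)          # hypothetical dX_i/dtheta values
alpha = (1.0 + z) ** 6           # cost rising steeply with redshift
n, budget = 3.0, 100.0
eps_opt = optimal_errors(deriv, alpha, n, budget)
```

In this toy setup the optimal error bars grow with redshift, since both the rising cost and the falling sensitivity disfavour spending resources on the high-z bins.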

***For $n = 2$ the minimum occurs on the boundary of the allowed
region but does not correspond to a point where the first derivatives
vanish. For $n > 2$ such a local minimum does appear within the
physical region allowed by the constraint.

What does our alternative FoM, D-optimality, formulated in terms of
the determinant give? Here we include the prior precision matrix and
expected Fisher matrix from SNAP (see eq. 12). Consider the simplest
case where the covariance matrices are diagonal with two parameters
and independent of $\theta_\mu$. In this case optimising the FoM is
equivalent to optimising just the determinant without the logarithms
in (13) or (9), since the logarithm is a monotonic function. Solving
the resulting system for two parameters subject to the same
constraint, eq. (26) (which defines $n$), gives:

e.- = 



22j 



4 = P 4- P 4- T SNAP 



x Y 



(28) 



where $Y$ is simply a constant normalisation as before. The
generalisation to more than two parameters is trivial. The resulting
optimum $\epsilon_i$ have the same basic structure as (28), namely
they are proportional to the $\alpha_i$ and inversely proportional to
a weighted sum of the derivatives of $X$ with respect to the
parameters $\theta_1, \theta_2$.

The main difference from diagonal Fisher matrices implementing
A-optimality is that now the prior precision matrix and expected
errors from SNAP, Planck etc. play an important part in determining
optimal strategy. Naively one might expect that, everything else
being equal ($\alpha_i$, $X$-derivatives etc.), one should look where
prior and competing experiment constraints are bad, i.e. in the
'data-desert'. Instead, at least in this case, it tells us that we
should concentrate on the regions where constraints are already good
and where competing experiments will focus (since the combined prior
and competing-experiment precision is largest there).

As an interesting example, consider a baryon oscillation survey such
as KAOS [11,23,22,24], which would yield excellent constraints both
on the Hubble 'constant' and the angular-diameter distance as a
function of redshift out to $z = 3$ and beyond. Which part of the
redshift range should KAOS focus on? Considering the angular diameter
distance case first and the fact that SNAP and the currently highest
known redshift SNIa go only to $z = 1.7$, it is clear that the region
beyond $z = 1.7$ will have only the prior contribution, $P_{\mu\nu}$,
and so optimal error bars should be larger there. Indeed, there may
be a good argument for not going beyond the optical, viz. $z = 1.3$.

The Hubble constant constraints are very different 
since there are essentially no current constraints and 
SNAP will give no direct constraints on it. A proper 
optimisation of KAOS would have to include both data 
sets and is left to the future. 



1. Small off-diagonal terms don't matter for D-optimality 

Most analytical work on covariance matrices (such as that above)
assumes diagonality, i.e. that the covariances between parameters
vanish. However, we can derive an interesting result relevant to
D-optimality when there are small off-diagonal terms, viz.

$$F = F_0 (I + \epsilon F_1), \qquad \epsilon \ll 1 \qquad (29)$$

where here $I$ is the unit matrix, $F_0$ is diagonal and $F_1$
carries only off-diagonal terms. Then we can solve perturbatively
(using the general relation $\ln\det C = \mathrm{tr}\ln C$ for any
matrix $C$) to find

$$\det(F) = \det(F_0)\left[1 + \epsilon\, \mathrm{tr}(F_1) + O(\epsilon^2)\right] \qquad (30)$$

But since our splitting into $F_0$ and $F_1$ was done so that all the
diagonal entries of $F_1$ vanish we have:

$$\det(F) = \det(F_0) + O(\epsilon^2) \qquad (31)$$

Hence, the assumption of diagonality is a good one for 
D-optimality as long as the off-diagonal terms are reason- 
ably small. D-optimal solutions are stable under small 
non-diagonal perturbations of the covariance/Fisher ma- 
trices. 

Unfortunately the same does not hold for A-optimality since the
inverse of eq. (30) will contain a term linear in $\epsilon$. Hence
we can expect small non-diagonal terms to cause a bigger shift on
A-optimal solutions than D-optimal ones. However, since $F_0$ is
diagonal one can easily and systematically compute corrections to the
A-optimality FoM to any order in $\epsilon$. Systematically
correcting optimal solutions for such perturbations is an interesting
issue for further study.
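
The different sensitivity of the two figures of merit to off-diagonal terms can be seen in a small numerical experiment. The sketch below perturbs a diagonal matrix by $\epsilon S$, with $S$ symmetric and zero on the diagonal, and compares the shifts obtained for $\epsilon$ and $\epsilon/2$: a quadratic dependence shrinks by a factor of about 4, a linear one by about 2. For the A-optimality-type quantity it ASSUMES the all-ones weight matrix mentioned earlier, i.e. the sum of all covariance entries; the matrices and names are hypothetical.

```python
import numpy as np

def perturbed(F0_diag, S, eps):
    """F = F0 + eps * S, i.e. F0 (I + eps F1) with F1 = F0^{-1} S off-diagonal."""
    return np.diag(F0_diag) + eps * S

def d_fom(F):
    """D-optimality-type quantity: log-determinant of the Fisher matrix."""
    return np.log(np.linalg.det(F))

def a_fom(F):
    """A-optimality-type quantity with an all-ones weight matrix (assumption):
    the sum of all entries of the covariance matrix."""
    return np.linalg.inv(F).sum()

F0_diag = np.array([2.0, 3.0, 4.0])
S = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.2],
              [0.5, 0.2, 0.0]])          # symmetric, zero diagonal

def shift(fom, eps):
    """Change in a figure of merit caused by the off-diagonal perturbation."""
    return fom(perturbed(F0_diag, S, eps)) - fom(perturbed(F0_diag, S, 0.0))

# Halving eps: a quadratic shift drops by about 4, a linear shift by about 2.
ratio_d = shift(d_fom, 1e-3) / shift(d_fom, 5e-4)
ratio_a = shift(a_fom, 1e-3) / shift(a_fom, 5e-4)
```

Note that with a purely diagonal weight (the plain trace of the covariance) the linear term would also vanish, so the sensitivity of A-optimality here depends on the choice of weight matrix.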



2. A simple example 

We end with a simple but relevant example where analytical
optimisation applies. Consider the luminosity distance,
$X = d_L(z)$, Taylor expanded in some general basis functions
$Y_i(z)$ which could be nonlinear in $z$ (such as
$Y_i = (1 - a)^i = \left(\frac{z}{1+z}\right)^i$):

$$d_L(z) = d_0 \left[ 1 + \theta_1 Y_1(z) + \theta_2 Y_2(z) + \ldots \right] \qquad (32)$$

The key point is that this formula is linear in the $\theta_\mu$, and
hence the problem reduces to that in eq. (26). On the other hand, had
we chosen to parametrise the dark energy equation of state $w(z)$ or
energy density $\rho(z)$, then the dependence of $d_L(z)$ on the
$\theta_\mu$ would be nonlinear in $\theta_\mu$ and the covariance
matrix would need to be integrated numerically over $\Theta$.
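
The statement that a linear parametrisation removes the need for parameter-space integration can be verified directly: the Fisher matrix is then the same at every point in parameter space. The sketch below assumes an expansion of the form $d_L = d_0\,[1 + \sum_k \theta_k Y_k(z)]$ with the basis $Y_k = (z/(1+z))^k$, independent Gaussian errors per bin, and finite-difference derivatives; the redshift bins, error bars and function names are hypothetical.

```python
import numpy as np

def basis(z, order):
    """Hypothetical basis functions Y_k(z) = (z / (1 + z))^k, k = 1..order."""
    y = z / (1.0 + z)
    return np.stack([y ** k for k in range(1, order + 1)], axis=-1)

def d_lum(z, theta, d0=1.0):
    """Linear expansion d_L = d0 * (1 + sum_k theta_k Y_k(z))."""
    return d0 * (1.0 + basis(z, len(theta)) @ theta)

def fisher(z, eps, theta, step=1e-6):
    """Fisher matrix F_mn = sum_i (dX_i/dtheta_m)(dX_i/dtheta_n) / eps_i^2,
    with derivatives taken by central finite differences around theta."""
    grads = []
    for m in range(len(theta)):
        dtheta = np.zeros_like(theta)
        dtheta[m] = step
        grads.append((d_lum(z, theta + dtheta) - d_lum(z, theta - dtheta)) / (2 * step))
    J = np.stack(grads, axis=-1)                 # shape (n_bins, n_params)
    return J.T @ (J / eps[:, None] ** 2)

z = np.linspace(0.2, 1.6, 8)
eps = 0.05 * (1.0 + z)                           # toy error bars
F_a = fisher(z, eps, np.array([0.0, 0.0]))
F_b = fisher(z, eps, np.array([1.3, -0.7]))
```

Since `F_a` and `F_b` agree, any integral of the Fisher matrix over the parameter space reduces to a constant volume factor, which is why the optimisation can proceed analytically in this case.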

This raises the question: which fundamental quantity is the best to
use in designing a survey? Clearly avoiding integration over the
parameter space is useful from a computational view, especially if
the optimisation can then be done in large part analytically. This
can of course be achieved by using a parameter space which linearly
parametrises the quantity of interest, as in the example above,
although the connection to fundamental physics will not be clear
(hence it is not clear that one is optimising with respect to
fundamental physics). This interesting issue is left to the future.



VI. CONCLUSIONS AND FUTURE WORK 

Mutual optimisation of surveys is important if we are to maximise the
knowledge extracted from next-generation experiments about cosmology
and the true nature of dark energy in particular. Given the profound
impact that understanding dark energy will have on fundamental
physics it is a worthy endeavour. In this paper
we propose a framework for survey design and optimisa- 
tion which is model-independent (does not assume any 
specific underlying dark energy or cosmological model), 
maximises serendipitous discoveries, is flexible (the user 
can define their own parameter space to be integrated 
over) and allows optimisation of a survey given other 
'competing' surveys such as SNAP or Planck. In this 
way the best niche for a survey can be found precisely 
and quickly. 

This framework - Integrated Parameter Space Optimi- 
sation (IPSO) - is set-up within the context of Bayesian 
optimal design and naturally allows for best use of prior 
information in the design phase. The basic idea of IPSO 
is to extremise a Figure of Merit (FoM) - found by integrating
the $1\sigma$ errors for that geometry over the entire
dark energy parameter space - in the set of candidate
survey geometries under consideration.

We have considered three main FoM which correspond 
to minimising the average sum of parameter variances, 
minimising the average error ellipse volume and max- 
imising the gain in Shannon information in going from 
the prior to the posterior. The first and last FoM corre- 
spond to A-optimal and D-optimal solutions in Bayesian 
optimal design literature when the weight matrix takes 
on a special form. We have further shown that D-optimal 
and the minimum average volume solutions, obtained as- 
suming diagonal covariance matrices, are stable to small 
off-diagonal perturbations, while A-optimal solutions are 
not. 

The parameter space integration allows these FoM to be
model-independent and to be sensitive to the entire gamut of dark
energy models, while mutual optimisation with respect to other
surveys is achieved trivially by including their effects in the
forecasting of the $1\sigma$ errors. Optimisation then automatically
chooses a survey configuration to "orthogonalise" the resulting sets
of constraints and find the best niche for the survey being designed
in the crowded market place.

Some of the questions that will need to be addressed in future work
are which FoM to use for dark energy studies, how to optimally choose
the parameter space $\Theta$ and the weight matrix $W_{\mu\nu}$,
especially to best answer specific questions, such as whether dark
energy is dynamical or not. The latter cannot be decided upon until
$\Theta$ is fixed. But even when this is done, choice of
$W_{\mu\nu}$ is highly non-trivial if one ventures away from unity.
Also of interest is the numerical implementation of IPSO and how FoM
change under a smooth change of $\Theta$. Detailed study should be
undertaken to resolve these interesting issues. A conservative first
step is to use either the A-optimal or D-optimal expressions,
equations (6) and (9) respectively.

Nevertheless, the fact that optimised survey design is now a question
worth addressing seriously reflects the rapid gain in maturity of
modern observational cosmology and illustrates the coming profound
shift to over-determined science where each of the inputs to
cosmology is strongly constrained from multiple vantage points. The
golden age of cosmology will be a show worth keeping one's eyes open
for.